Large-scale epigenome imputation improves data quality and disease variant enrichment

Authors

  • Jason Ernst
  • Manolis Kellis
Abstract

With hundreds of epigenomic maps, the opportunity arises to exploit the correlated nature of epigenetic signals, across both marks and samples, for large-scale prediction of additional datasets. Here, we undertake epigenome imputation by leveraging such correlations through an ensemble of regression trees. We impute 4,315 high-resolution signal maps, of which 26% are also experimentally observed. Imputed signal tracks show overall similarity to observed signals, and surpass experimental datasets in consistency, recovery of gene annotations, and enrichment for disease-associated variants. We use the imputed data to detect low-quality experimental datasets, to find genomic sites with unexpected epigenomic signals, to define high-priority marks for new experiments, and to delineate chromatin states in 127 reference epigenomes spanning diverse tissues and cell types. Our imputed datasets provide the most comprehensive human regulatory annotation to date, and our approach and the ChromImpute software constitute a useful complement to large-scale experimental mapping of epigenomic information.

Correspondence should be addressed to J.E. ([email protected]) or M.K. ([email protected]).

Author Contributions: J.E. and M.K. developed the method, analyzed the results, and wrote the paper.

Competing Financial Interests: The authors declare no competing financial interests.

Availability of Imputed Signal Data, Imputation-based Peak Calls and Chromatin States, and ChromImpute software: All imputed signal datasets, and the peak calls and chromatin states based on imputed data, are available from http://compbio.mit.edu/roadmap. The ChromImpute software is available at http://www.biolchem.ucla.edu/labs/ernst/ChromImpute and source code is maintained at https://github.com/ernstlab/ChromImpute.

Published in final edited form as: Nat Biotechnol. 2015 April; 33(4): 364–376. doi:10.1038/nbt.3157. Author manuscript; available in PMC 2015 October 01.

Related articles

Across-Platform Imputation of DNA Methylation Levels Incorporating Nonlocal Information Using Penalized Functional Regression.

DNA methylation is a key epigenetic mark involved in both normal development and disease progression. Recent advances in high-throughput technologies have enabled genome-wide profiling of DNA methylation. However, DNA methylation profiling often employs different designs and platforms with varying resolution, which hinders joint analysis of methylation data from multiple platforms. In this stud...

Imputation-based Assessment of Next Generation Rare Exome Variant Arrays

A striking finding from recent large-scale sequencing efforts is that the vast majority of variants in the human genome are rare and found within single populations or lineages. These observations hold important implications for the design of the next round of disease variant discovery efforts-if genetic variants that influence disease risk follow the same trend, then we expect to see populatio...

Systematic assessment of imputation performance using the 1000 Genomes reference panels

Genotype imputation has been widely adopted in the postgenome-wide association studies (GWAS) era. Owing to its ability to accurately predict the genotypes of untyped variants, imputation greatly boosts variant density, allowing fine-mapping studies of GWAS loci and large-scale meta-analysis across different genotyping arrays. By leveraging genotype data from 90 whole-genome deeply sequenced in...

An Evaluation Framework for Privacy-Preserving Record Linkage

Linking data from multiple sources enables more sophisticated analysis and data mining by improving the quality of data through the identification and resolution of conflicting data values, the enrichment of data, and the imputation of missing values [30]. The analysis of integrated data can, for example, facilitate the detection of adverse drug reactions in particular patient groups, or enable...

EGO: A Biomedical Ontology for Integrative Epigenome Representation and Analysis

Epigenomics is crucial to understand biological mechanisms beyond genome DNA. To better represent epigenomic knowledge and support data integration, we developed a prototype Epigenome Ontology (EGO). EGO top level hierarchy and design pattern are provided with a use case illustration. EGO is proposed to be used for statistically analyzing enriched epigenomic features based on given sequence dat...

Journal title:

Volume 33, Issue

Pages -

Publication date: 2015